K-means and Graph-based Approaches for Chinese Word Sense Induction Task

نویسندگان

  • Lisha Wang
  • Yanzhao Dou
  • Xiaoling Sun
  • Hongfei Lin
چکیده

This paper details our experiments carried out at Word Sense Induction task. For the foreign language (especially English), there have been many studies of word sense induction (WSI), and the approaches and the techniques are more and more mature. However, the study of Chinese WSI is just getting started, and there has not been a better way to solve the problems encountered. WSI can be divided into two categories: supervised manner and unsupervised manner. But in the light of the high cost of supervised manner, we introduce novel solutions to automatic and unsupervised WSI. In this paper, we propose two different systems. The first one is called K-means-based Chinese word sense induction in an unsupervised manner while the second one is graph-based Chinese word sense induction. In the experiments, the first system has achieved a 0.7729 Fscore on average while the second one has achieved a 0.6067 Fscore.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Chinese Word Sense Induction with Basic Clustering Algorithms

Word Sense Induction (WSI) is an important topic in natural langage processing area. For the bakeoff task Chinese Word Sense Induction (CWSI), this paper proposes two systems using basic clustering algorithms, k-means and agglomerative clustering. Experimental results show that k-means achieves a better performance. Based only on the data provided by the task organizers, the two systems get FSc...

متن کامل

LSTC System for Chinese Word Sense Induction

This paper presents the Chinese word sense Induction system of Leshan Teachers’ College. The system participates in the Chinese word sense Induction of task 4 in Back offs organized by the Chinese Information Processing Society of China (CIPS) and SIGHAN. The system extracts neighbor words and their POSs centered in the target words and selected the best one of four cluster algorithms: Simple K...

متن کامل

Improving Word Sense Induction by Exploiting Semantic Relevance

Word Sense Induction (WSI) is the task of automatically inducing the different senses of a target word from unannotated text. Traditional approaches based on the vector space model (VSM) represent each context of a target word as a vector of selected features (e.g. the words occurring in the context). These approaches assume that the words occurring in the context are independent and do not exp...

متن کامل

ISCAS: A System for Chinese Word Sense Induction Based on K-means Algorithm

This paper presents an unsupervised method for automatic Chinese word sense induction. The algorithm is based on clustering the similar words according to the contexts in which they occur. First, the target word which needs to be disambiguated is represented as the vector of its contexts. Then, reconstruct the matrix constituted by the vectors of target words through singular value decompositio...

متن کامل

Word Sense Induction by Community Detection

Word Sense Induction (WSI) is an unsupervised approach for learning the multiple senses of a word. Graph-based approaches to WSI frequently represent word co-occurrence as a graph and use the statistical properties of the graph to identify the senses. We reinterpret graph-based WSI as community detection, a well studied problem in network science. The relations in the co-occurrence graph give r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010